Evaluating Behavioral Alignment in Conflict Dialogue: A Multi-Dimensional Comparison of LLM Agents and Humans
Kwon, Deuksin, Shrestha, Kaleen, Han, Bin, Lee, Elena Hayoung, Lucas, Gale
Large Language Models (LLMs) are increasingly deployed in socially complex, interaction-driven tasks, yet their ability to mirror human behavior in emotionally and strategically complex contexts remains underexplored. This study assesses the behavioral alignment of personality-prompted LLMs in adversarial dispute resolution by simulating multi-turn conflict dialogues that incorporate negotiation. Each LLM is guided by a matched Five-Factor personality profile to control for individual variation and enhance realism. We evaluate alignment across three dimensions: linguistic style, emotional expression (e.g., anger dynamics), and strategic behavior. GPT-4.1 achieves the closest alignment with humans in linguistic style and emotional dynamics, while Claude-3.7-Sonnet best reflects strategic behavior. Nonetheless, substantial alignment gaps persist. Our findings establish a benchmark for alignment between LLMs and humans in socially complex interactions, underscoring both the promise and the limitations of personality conditioning in dialogue modeling.
- North America > United States > California (0.14)
- Europe > Ireland (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- (4 more...)
- Research Report > New Finding (0.66)
- Research Report > Experimental Study (0.46)
How Persuasive is Your Context?
Nguyen, Tu, Du, Kevin, Hoyle, Alexander Miserlis, Cotterell, Ryan
Two central capabilities of language models (LMs) are: (i) drawing on prior knowledge about entities, which allows them to answer queries such as "What's the official language of Austria?", and (ii) adapting to new information provided in context, e.g., "Pretend the official language of Austria is Tagalog," which is prepended to the question. In this article, we introduce targeted persuasion score (TPS), designed to quantify how persuasive a given context is to an LM, where persuasion is operationalized as the ability of the context to alter the LM's answer to the question. In contrast to evaluating persuasiveness only by inspecting the greedily decoded answer under the model, TPS provides a more fine-grained view of model behavior. Based on the Wasserstein distance, TPS measures how much a context shifts a model's original answer distribution toward a target distribution. Empirically, through a series of experiments, we show that TPS captures a more nuanced notion of persuasiveness than previously proposed metrics.
- Europe > Austria (0.44)
- Europe > United Kingdom (0.28)
- South America > Brazil (0.04)
- (8 more...)
- Law (1.00)
- Health & Medicine (1.00)
- Government (1.00)
- (5 more...)
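The abstract above describes the core mechanics of TPS: compare how far the model's answer distribution sits from a target distribution before and after a context is prepended, using the Wasserstein distance. A minimal sketch of that idea, assuming the answers can be treated as an ordered discrete support and using SciPy's 1-Wasserstein implementation (the function and variable names here are illustrative, not the paper's implementation):

```python
import numpy as np
from scipy.stats import wasserstein_distance

def shift_toward_target(p_orig, p_ctx, p_target, support=None):
    """Illustrative persuasion-style score: how much a context moves the
    model's answer distribution toward a target distribution, measured
    with the 1-Wasserstein distance. All names are assumptions."""
    if support is None:
        support = np.arange(len(p_orig))  # treat answers as ordered bins
    d_before = wasserstein_distance(support, support, p_orig, p_target)
    d_after = wasserstein_distance(support, support, p_ctx, p_target)
    # Positive when the context moved probability mass toward the target.
    return d_before - d_after

# A context that fully flips a point mass from answer 0 to answer 2
score = shift_toward_target([1, 0, 0], [0, 0, 1], [0, 0, 1])
```

A fully persuasive context closes the entire original distance to the target, so the score equals the pre-context Wasserstein distance.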
Sentiment Analysis of Airbnb Reviews: Exploring Their Impact on Acceptance Rates and Pricing Across Multiple U.S. Regions
This research examines whether Airbnb guests' positive and negative comments influence acceptance rates and rental prices across six U.S. regions: Rhode Island, Broward County, Chicago, Dallas, San Diego, and Boston. Thousands of reviews were collected and analyzed using Natural Language Processing (NLP) to classify sentiments as positive or negative, followed by statistical testing (t-tests and basic correlations) on the average scores. The findings reveal that over 90 percent of reviews in each region are positive, indicating that having additional reviews does not significantly enhance prices. However, listings with predominantly positive feedback exhibit slightly higher acceptance rates, suggesting that sentiment polarity, rather than the sheer volume of reviews, is a more critical factor for host success. Additionally, budget listings often gather extensive reviews while maintaining competitive pricing, whereas premium listings sustain higher prices with fewer but highly positive reviews. These results underscore the importance of sentiment quality over quantity in shaping guest behavior and pricing strategies in an overwhelmingly positive review environment.
- North America > United States > Florida > Broward County (0.26)
- North America > United States > Rhode Island (0.26)
- North America > United States > California > San Diego County > San Diego (0.26)
- (3 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.89)
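The study above pairs sentiment classification with t-tests on average scores across listing groups. A minimal sketch of the statistical step, assuming per-listing average sentiment scores for two groups (the synthetic data here is illustrative and does not reproduce the study's dataset or preprocessing):

```python
import numpy as np
from scipy.stats import ttest_ind

# Hypothetical per-listing average sentiment scores for two groups,
# e.g., listings with predominantly positive vs. mixed feedback.
rng = np.random.default_rng(0)
mostly_positive = rng.normal(loc=0.9, scale=0.05, size=200)
mixed = rng.normal(loc=0.7, scale=0.10, size=200)

# Welch's t-test (no equal-variance assumption) on the group means.
t_stat, p_value = ttest_ind(mostly_positive, mixed, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
```

With group means this far apart the test rejects the null of equal means; on real review data the interesting case is when it does not, as with the price comparison reported above.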
Reviews: Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks
This paper provides a generalization bound for training over-parameterized deep neural networks with ReLU activation and cross-entropy loss using SGD. Initially the paper received mixed reviews, with two positive and one negative review. On the one hand, the analysis is found to be intuitive, general, and potentially influential, the generalization bound is found to be more general and sharper than many existing generalization error bounds for over-parameterized neural networks, and the paper is found to be very well written. On the other hand, the width requirement is found to be too strict. The rebuttal addressed the issues raised by the reviewers: one rating was increased from 6 to 8, and the negative reviewer updated the score to 6. Upon discussion, the reviewers agreed that the paper should be accepted.
Reviews: Piecewise Strong Convexity of Neural Networks
This paper shows that the quadratic loss with weight decay of deep ReLU networks is piecewise strongly convex on a nonempty open set where every critical point is a local minimum, and every local minimum is isolated. Initially the paper received mixed reviews, with two positive and one negative review. On the positive side, the contribution is found to be quite significant because it analyzes realistic networks (deep and non-linear). On the other hand, one reviewer had issues with the proof, and another with the experiments. The rebuttal addressed the issues raised by the reviewers, and the negative review updated the score.
Review for NeurIPS paper: UnModNet: Learning to Unwrap a Modulo Image for High Dynamic Range Imaging
The submission has received two positive and two negative reviews. The post-rebuttal discussion has not led to convergence, and the opinions of the reviewers remain split. The concerns of the "negative" reviewers are: 1) The application is too niche (R1). However, the topic of the paper falls within the NeurIPS call for papers, as it is related to low-level computer vision, compressed sensing, and deep neural architectures. The authors rebut that the results in [55] were cherry-picked and that they use the code from [55], while fixing the parameters.
ChatGPT search tool vulnerable to manipulation and deception, tests show
OpenAI's ChatGPT search tool may be open to manipulation using hidden content, and can return malicious code from websites it searches, a Guardian investigation has found. OpenAI has made the search product available to paying customers and is encouraging users to make it their default search tool. But the investigation has revealed potential security issues with the new system.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)
Sentiment Analysis Based on RoBERTa for Amazon Review: An Empirical Study on Decision Making
In this study, we leverage state-of-the-art Natural Language Processing (NLP) techniques to perform sentiment analysis on Amazon product reviews. By employing the transformer-based model RoBERTa, we analyze a vast dataset to derive sentiment scores that accurately reflect the emotional tones of the reviews. We provide an in-depth explanation of the underlying principles of these models and evaluate their performance in generating sentiment scores. Further, we conduct comprehensive data analysis and visualization to identify patterns and trends in sentiment scores, examining their alignment with behavioral economics principles such as electronic word of mouth (eWOM), consumer emotional reactions, and confirmation bias. Our findings demonstrate the efficacy of advanced NLP models in sentiment analysis and offer valuable insights into consumer behavior, with implications for strategic decision-making and marketing practices.
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.48)
- Marketing (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- (2 more...)
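The abstract above derives per-review sentiment scores from a transformer classifier. The final step of any such classifier is converting the model head's raw logits into class scores via a softmax; a minimal sketch of that step, assuming a three-way label set common to RoBERTa sentiment checkpoints (the label names and example logits are illustrative, not taken from the paper):

```python
import numpy as np

def logits_to_scores(logits, labels=("negative", "neutral", "positive")):
    """Convert a classifier head's raw logits into per-class sentiment
    scores with a numerically stable softmax. The three-way label set
    is an assumption, not a detail from the paper."""
    z = np.asarray(logits, dtype=float)
    z -= z.max()                          # stability: avoid overflow in exp
    probs = np.exp(z) / np.exp(z).sum()   # normalize to a distribution
    return dict(zip(labels, probs))

# Hypothetical logits for a clearly positive review
scores = logits_to_scores([-1.2, 0.3, 2.5])
```

The resulting dictionary sums to one, so the scores can be aggregated or averaged across reviews in the downstream analysis the abstract describes.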
LLM-Cure: LLM-based Competitor User Review Analysis for Feature Enhancement
Assi, Maram, Hassan, Safwat, Zou, Ying
The exponential growth of the mobile app market underscores the importance of constant innovation and rapid response to user demands. As user satisfaction is paramount to the success of a mobile application (app), developers typically rely on user reviews, which represent user feedback that includes ratings and comments, to identify areas for improvement. However, the sheer volume of user reviews poses challenges for manual analysis, necessitating automated approaches. Existing automated approaches either analyze only the target app's reviews, neglecting the comparison of similar features to competitors, or fail to provide suggestions for feature enhancement. To address these gaps, we propose LLM-Cure, a Large Language Model (LLM)-based Competitor User Review Analysis for Feature Enhancement approach that automatically generates suggestions for mobile app feature improvements. More specifically, LLM-Cure identifies and categorizes features within reviews by applying LLMs. When provided with a complaint in a user review, LLM-Cure curates highly rated (4 and 5 stars) reviews in competing apps related to the complaint and proposes potential improvements tailored to the target application. We evaluate LLM-Cure on 1,056,739 reviews of 70 popular Android apps. Our evaluation demonstrates that LLM-Cure significantly outperforms state-of-the-art approaches in assigning features to reviews, by up to 13% in F1-score, up to 16% in recall, and up to 11% in precision. Additionally, LLM-Cure demonstrates its capability to provide suggestions for resolving user complaints. We verify the suggestions using the release notes that reflect the changes of features in the target mobile app. LLM-Cure achieves a promising result, with an average of 73% of the provided suggestions implemented.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > Ontario > Kingston (0.04)
- (6 more...)
- Overview (1.00)
- Research Report > New Finding (0.93)
- Leisure & Entertainment (0.93)
- Information Technology > Software (0.48)
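The curation step described above, given a complaint, collect highly rated (4 and 5 star) competitor reviews related to it, can be sketched as a simple filter. Here plain keyword matching stands in for the paper's LLM-based feature assignment, and all field and function names are illustrative:

```python
def curate_competitor_reviews(complaint_keywords, competitor_reviews):
    """Toy version of LLM-Cure's curation step: keep only highly rated
    (4-5 star) competitor reviews mentioning a feature keyword from the
    complaint. Keyword matching is a stand-in for the paper's LLM-based
    feature assignment."""
    keep = []
    for review in competitor_reviews:
        if review["rating"] >= 4 and any(
            kw in review["text"].lower() for kw in complaint_keywords
        ):
            keep.append(review)
    return keep

reviews = [
    {"rating": 5, "text": "Love the dark mode in this app"},
    {"rating": 2, "text": "Dark mode is broken after the update"},
    {"rating": 4, "text": "Great login flow"},
]
curated = curate_competitor_reviews(["dark mode"], reviews)
```

In the real pipeline the curated reviews would then be passed to an LLM to draft improvement suggestions for the target app.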
Evaluating Nuanced Bias in Large Language Model Free Response Answers
Healey, Jennifer, Byrum, Laurie, Akhtar, Md Nadeem, Sinha, Moumita
Pre-trained large language models (LLMs) can now be easily adapted for specific business purposes using custom prompts or fine-tuning. These customizations are often iteratively re-engineered to improve some aspect of performance, but after each change businesses want to ensure that there has been no negative impact on the system's behavior around such critical issues as bias. Prior methods of benchmarking bias use techniques such as word masking and multiple choice questions to assess bias at scale, but these do not capture all of the nuanced types of bias that can occur in free response answers, the types of answers typically generated by LLM systems. In this paper, we identify several kinds of nuanced bias in free text that cannot be similarly identified by multiple choice tests. We describe these as: confidence bias, implied bias, inclusion bias, and erasure bias. We present a semi-automated pipeline for detecting these types of bias by first eliminating answers that can be automatically classified as unbiased and then co-evaluating name-reversed pairs using crowd workers. We believe that the nuanced classifications our method generates can be used to give better feedback to LLMs, especially as LLM reasoning capabilities become more advanced.
- North America > United States > Washington > King County > Seattle (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- South America > Colombia > Meta Department > Villavicencio (0.04)
- (9 more...)
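The co-evaluation step above relies on name-reversed pairs: the same answer with the two names swapped, so crowd workers can judge whether the model treats the names differently. A minimal sketch of building the reversed member of a pair, using a placeholder so the second substitution does not clobber the first (the names and example text are illustrative, not from the paper's data):

```python
def name_reverse(answer, name_a, name_b, placeholder="\x00"):
    """Swap two names in a model answer to build the reversed member of
    a name-reversed pair. A placeholder token keeps the first swap from
    being overwritten by the second."""
    swapped = answer.replace(name_a, placeholder)
    swapped = swapped.replace(name_b, name_a)
    return swapped.replace(placeholder, name_b)

original = "Alice is clearly confident, while Bob hesitates."
reversed_answer = name_reverse(original, "Alice", "Bob")
```

If a worker rates the original and the reversed answer differently, the discrepancy flags one of the nuanced bias types, such as confidence bias, for closer inspection.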